Sir:

I, Satoshi Ishii, a citizen of Japan, hereby declare and state the following:

1. I graduated from Tokyo Institute of Technology, Japan in 1988 with a BS degree
in Chemistry and was awarded a PhD degree in Biochemistry in 1997 from The University of
Tokyo, Japan.

2. Since 2000, I have been employed by The University of Tokyo, Japan where my
present title is Associate Professor. During my employment therein, I have conducted
research on G-protein coupled receptors for bioactive lipids.

3. I am the author of the following publications: 

Noguchi, K., S. Ishii, and T. Shimizu. 2003. Identification of p2y9/GPR23 as a novel G 
protein-coupled receptor for lysophosphatidic acid, structurally distant from the Edg family. J
Biol Chem 278:25600-25606.

4. I have read and am familiar with the above-identified patent application as well 
as the Official Action dated November 4, 2009, in the application. 
5. Under my supervision and control, I conducted experiments to obtain data
showing that the mouse homolog of human p2y9, also known as GPR23, bound LPA. The
experiments were performed using the mouse p2y9/GPR23 (Acc. No. AK 04528?) based on
the following protocol:

RH7777 cells were cultured on collagen-coated dishes in Dulbecco's modified
Eagle's medium (DMEM: Sigma) supplemented with 10% fetal bovine serum (Cambrex
Company, Walkersville, MD). Cells were transfected with either pCXN2.1-mouse
p2y9/GPR23 or pCXN2.1 empty vector using Lipofectamine 2000 reagent (Invitrogen,
Carlsbad, CA). After 24 h of transfection, cells were washed with phosphate-buffered
saline three times and serum-starved for 24 h in DMEM supplemented with 0.1% BSA. The
cells were washed again with phosphate-buffered saline twice and scraped off. After further
washing with binding buffer (25 mM HEPES-NaOH (pH 7.4), 10 mM MgCl2 and 0.25 M
sucrose), the cells were suspended in the buffer with additional 20 µM APMSF (Sigma) and a
protease inhibitor cocktail (Complete; Roche, Basel, Switzerland), sonicated three times at 15
watts for 30 s, and centrifuged at 800 x g for 10 min at 4°C. The supernatant was further
centrifuged at 100,000 x g for 60 min at 4°C, and the resultant pellet was homogenized in
ice-cold binding buffer. Binding assays were performed in 96-well plates in triplicate. 20
µg of the membrane fractions were incubated in binding buffer containing 0.25% BSA with
various concentrations of [3H]-LPA for 60 min at 4°C. The bound [3H]-LPA was collected
onto a Unifilter-96-GF/C (PerkinElmer) using a MicroMate 196 harvester (PerkinElmer).
The filter was then rinsed ten times with binding buffer containing 0.25% BSA and dried for
2 h at 50°C. 25 µl of MicroScint-O scintillation cocktail (PerkinElmer) was added per well.
The radioactivity that remained on the filter was measured with a TopCount microplate
scintillation counter (PerkinElmer). Total and nonspecific bindings were evaluated in the
absence and presence of 10 µM unlabeled 18:1-LPA, respectively. The specific binding
value (dpm) was calculated by subtracting the nonspecific binding value (dpm) from the total
binding value (dpm).
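
The subtraction described in the last step can be sketched as follows. This is only an illustration: the triplicate dpm readings are hypothetical values, not data from the experiments, and averaging the triplicates before subtracting is an assumption about how the wells were combined.

```python
from statistics import mean


def specific_binding(total_dpm, nonspecific_dpm):
    """Mean total binding minus mean nonspecific binding, in dpm."""
    return mean(total_dpm) - mean(nonspecific_dpm)


# Hypothetical triplicate readings (dpm) at one [3H]-LPA concentration.
total_wells = [1520.0, 1480.0, 1500.0]       # no unlabeled 18:1-LPA
nonspecific_wells = [410.0, 390.0, 400.0]    # with 10 uM unlabeled 18:1-LPA

specific = specific_binding(total_wells, nonspecific_wells)
print(f"specific binding: {specific:.1f} dpm")
```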

6. The experiments yielded the following results:



[Figure: binding curves; x-axis: [3H]-LPA (nM). Legend: mouse GPR23 (total), mouse GPR23 (nonspecific), mock (total), mock (nonspecific).]

[3H]-LPA binding to RH7777 cell membranes transiently expressing mouse
GPR23. [3H]-LPA binding to mouse GPR23. Membrane fractions of RH7777 cells
transiently expressing mouse GPR23 (squares) and mock-transfected cells (triangles)
were incubated with increasing concentrations of [3H]-18:1-LPA in the presence or
absence of 10 µM unlabeled 18:1-LPA. Total binding (closed symbols) and nonspecific
binding (open symbols) are shown. Data are means ± S.D. (n = 3). Scatchard analysis of
the specific binding of [3H]-LPA to mouse GPR23.
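
A Scatchard analysis of the kind named in the legend plots bound/free against bound, so that the slope is -1/Kd and the x-intercept is Bmax. The sketch below fits that line by ordinary least squares. The concentrations and the Kd and Bmax values are synthetic, chosen only to show that the fit recovers them; treating the added ligand concentration as the free concentration is an assumption.

```python
def scatchard_fit(free_nM, bound):
    """Fit bound/free = Bmax/Kd - bound/Kd and return (Kd, Bmax)."""
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free_nM)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -1/Kd
    intercept = mean_y - slope * mean_x    # equals Bmax/Kd
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


# Synthetic one-site binding data (hypothetical Kd = 5 nM, Bmax = 100 units).
free = [1.0, 2.0, 5.0, 10.0, 20.0]
bound = [100.0 * f / (5.0 + f) for f in free]

kd, bmax = scatchard_fit(free, bound)
print(f"Kd = {kd:.2f} nM, Bmax = {bmax:.1f}")
```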
7. From the attached experimental results, I have concluded, among other things, that
mouse p2y9/GPR23, which has a 98.1% homology to human p2y9/GPR23, binds LPA. As
such, I have concluded that allele variants with at least 95% homology to human
p2y9/GPR23 will bind LPA.

The undersigned declares that all statements made herein of his own knowledge are
true, and that all statements made on information and belief are believed to be true; and
further that these statements were made with the knowledge that willful false statements and
the like so made are punishable by fine or imprisonment, or both, under §1001 of Title 18 of
the United States Code and that willful false statements may jeopardize the validity of the
application or any patent issued thereon.




Satoshi Ishii 



Signed this 27th day of March, 2009 
Abstract



Background: G-protein-coupled receptors (GPCRs) are the largest and most diverse family of 
transmembrane receptors. They respond to a wide range of stimuli, including small peptides, lipid 
analogs, amino-acid derivatives, and sensory stimuli such as light, taste and odor, and transmit 
signals to the interior of the cell through interaction with heterotrimeric G proteins. A large 
number of putative GPCRs have no identified natural ligand. We hypothesized that a more 
complete knowledge of the phylogenetic relationship of these orphan receptors to receptors 
with known ligands could facilitate ligand identification, as related receptors often have ligands
with similar structural features.

Results: A database search excluding olfactory and gustatory receptors was used to compile a 
list of accession numbers and synonyms of 81 orphan and 196 human GPCRs with known ligands.
Of these, 241 sequences belonging to the rhodopsin receptor-like family A were aligned and a 
tentative phylogenetic tree constructed by neighbor joining. This tree and local alignment tools 
were used to define 19 subgroups of family A small enough for more accurate maximum- 
likelihood analyses. The secretin receptor-like family B and metabotropic glutamate receptor-like 
family C were directly subjected to these methods. 

Conclusions: Our trees show the overall relationship of 277 GPCRs with emphasis on orphan 
receptors. Support values are given for each branch. This approach may prove valuable for 
identification of the natural ligands of orphan receptors as their relation to receptors with known
ligands becomes more evident.



Background 

G-protein-coupled receptors (GPCRs) are the largest and 
most diverse family of transmembrane receptors. They 
respond to a wide range of stimuli including small peptides, 
lipid analogs, amino-acid derivatives, and sensory stimuli 
such as light, taste and odor [1], and transmit signals to the 
interior of the cell through interaction with heterotrimeric G
proteins. Certain amino-acid residues of this receptor family
are well conserved and approaches exploiting this, such as
low-stringency hybridization and degenerate PCR, have been
used to clone new members of this large superfamily [2-4].
Many of these putative receptors share GPCR structural 
motifs, but still lack a defined physiologically relevant ligand. 
One strategy to identify the natural ligand of these so-called 
orphan receptors uses changes in second-messenger
activation in cells stably expressing the receptor in response 
to tissue extracts expected to contain the natural ligand [5]. 
In a second step, these extracts are tested and fractionated to 
purity, before being analyzed by mass spectrometry. This 
strategy led to the identification of several novel bioactive 
peptides or peptide families (for review see [6]). The identifi- 
cation of these natural ligands is likely to give further insight 
into the physiological role of these receptors and advance the 
design of pharmacologically active receptor agonists or 
antagonists. This is of particular interest, as GPCRs are the 
most targeted protein superfamily in pharmaceutical 
research [7]. Better prediction of the presumed chemical 
class or structure of the ligand facilitates the identification of 
orphan receptors by the strategy described above, as the 
ligand purification process can be tailored more specifically 
to the assumed class of substances. 

Phylogenetic analysis of receptor relationships has already 
been used to elucidate the chemical nature of receptor 
ligands. The identification of sphingosine 1-phosphate as the
ligand for the GPCR EDG-1 led to the prediction that EDG-3,
EDG-5, EDG-6 and EDG-8 have the same ligand [8-11]. In 
contrast, phylogenetically distinct members of the EDG 
cluster - EDG-2, EDG-4 and EDG-7 - are receptors for the 
similar but distinct ligand lysophosphatidic acid (LPA) 
[12-14]. Neuromedin U, a potent neuropeptide that causes 
contraction of smooth muscle, was correctly predicted 
phylogenetically to be the ligand of the orphan GPCR FM3 
(NMUR) [15]. Not only the ligand, but also the pharmacol- 
ogy of a novel receptor for histamine, was predicted and con- 
firmed through phylogeny [16]. GPR86, related to the ADP 
receptor P2Y12, was similarly recently shown to bind ADP 
[17], and UDP-glucose, a molecule involved in carbohydrate 
biosynthesis, was shown to be the ligand for the related 
receptor KIAA0001 [18].

Mammalian GPCRs were previously classified by phylogeny 
into three families [19,20]: the rhodopsin receptor-like 
family (A), the secretin receptor-like receptor family (B) and
the metabotropic glutamate receptor family (C). These 
results were generated by neighbor joining, a fast distance- 
based method suited for large datasets, but influenced by 
methodological flaws that can in part be overcome by 
methods not generally applied previously. 

In this work, we compiled an exhaustive list that includes all 
available synonyms and accession numbers of 196 human 
GPCRs with known ligands and 84 human orphan receptors. 
The 241 sequences belonging to family A were aligned, and a 
tentative tree constructed by neighbor joining with 1,000 
bootstrap steps. Subgroups of family A defined by this tree 
and sequences from families B and C were then used for 
more accurate phylogenetic analysis by state-of-the-art tech- 
niques. From this analysis, we tried to predict possible 
ligands for orphan receptors. 



Results and discussion 

We set out to define the phylogenetic relationship of human 
GPCRs by state-of-the-art tools, assuming that the identifi- 
cation of cognate ligands of orphan receptors will be facili- 
tated by a more complete knowledge of their relationship 
within the large and diverse superfamily. 

Database mining and multiple sequence alignment 

Most receptors were identified by different groups; there- 
fore, many confusing names and synonyms exist. We 
adhered to SWISS-PROT names where possible, and com- 
piled a list including all available synonyms and accession 
numbers of 196 human GPCRs with known ligands and 84 
human orphan receptors (Table 1 shows all receptors men- 
tioned in this work; the complete list is supplied as an addi- 
tional data file with the online version of this paper). 
Gustatory and olfactory receptors were omitted. Multiple 
protein sequences were aligned and the extremely variable 
amino termini upstream of the first transmembrane domain 
and carboxyl termini downstream of the seventh transmem- 
brane domain were deleted to avoid length heterogeneity 
(see Figure 1). The deleted regions contained no significant 
sequence conservation. 
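
The trimming step can be illustrated with a short sketch. The function name, the toy sequence and the transmembrane coordinates are hypothetical; in practice the boundaries would come from a transmembrane prediction for each receptor.

```python
def trim_termini(sequence, tm1_start, tm7_end):
    """Keep residues from the first residue of TM1 to the last residue of TM7.

    Positions are 1-based and inclusive, as in typical sequence annotation.
    """
    if not (1 <= tm1_start <= tm7_end <= len(sequence)):
        raise ValueError("transmembrane boundaries lie outside the sequence")
    return sequence[tm1_start - 1:tm7_end]


# Hypothetical toy sequence: 5 amino-terminal residues, a 15-residue core,
# and 4 carboxy-terminal residues.
toy_sequence = "MNNSTLLAVVGGFFWWYYCCKKRR"
core = trim_termini(toy_sequence, tm1_start=6, tm7_end=20)
print(core)
```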

Phylogenetic analysis 

Because of the large number of sequences in family A, we
had to use a combination of computational methods to 
accomplish the best possible description of their phyloge- 
netic relationship. In a first step we used the distance-based 
neighbor-joining method as the only one computationally 
feasible. Neighbor joining has been shown to be efficient at 
recovering the correct tree topology [21], but is greatly influ- 
enced by methodological errors, for example, the sampling 
error [22]. This can in part be overcome by bootstrapping, a 
method of testing the reliability of a dataset by the creation 
of pseudoreplicate datasets by resampling. Bootstrapping 
assesses whether stochastic effects have influenced the distribution
of amino acids [23]. In previous publications on
this topic, bootstrapping has not been generally used. 
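
As a rough illustration of how pseudoreplicate datasets are created, the sketch below resamples alignment columns with replacement. It is not the program used in this work; the function names and the three-sequence toy alignment are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: columns sampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return a list of pseudoreplicate alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment of three sequences and six columns.
toy_alignment = {"seqA": "ACDEFG", "seqB": "ACDQFG", "seqC": "TCDEYG"}
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000, seed=1)
print(replicates[0])
```

Each pseudoreplicate keeps the number of sequences and columns of the original alignment, so the same tree-building step can be repeated on every replicate.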

We generated a neighbor-joining tree of family-A sequences, 
and considered tree branches to be confirmed if they were 
found in more than 500 of 1,000 bootstrap steps (Figure 2). 
The same branching pattern was found by least squares 
(data not shown) as implemented in FITCH [24], but it was 
not possible to compute enough bootstrap steps with the 
equipment used. The remaining sequences of unconfirmed 
branches were then assigned to existing branches according 
to results obtained with the local alignment tool BLASTP 
(see Additional data files) [25] to account for similarities in 
parts of the sequences not sufficient for repeated global 
alignment. The p-value was used as a measure of similarity. 
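
The confirmation rule (a branch found in more than 500 of 1,000 bootstrap steps) amounts to counting how often each grouping recurs. A minimal sketch, assuming each replicate tree has already been reduced to the set of receptor groupings it contains; the receptor names R1 to R4 and the four toy replicates are hypothetical.

```python
from collections import Counter


def confirmed_clades(replicate_trees, min_fraction=0.5):
    """Return clades found in more than min_fraction of the replicate trees.

    Each replicate tree is a set of clades; each clade is a frozenset of the
    receptor names below one internal branch.
    """
    counts = Counter()
    for clades in replicate_trees:
        counts.update(set(clades))
    cutoff = min_fraction * len(replicate_trees)
    return {clade: n for clade, n in counts.items() if n > cutoff}


# Hypothetical toy input: four replicate trees over receptors R1..R4.
ab = frozenset({"R1", "R2"})
cd = frozenset({"R3", "R4"})
ac = frozenset({"R1", "R3"})
toy_replicates = [{ab, cd}, {ab, cd}, {ab}, {ac}]

support = confirmed_clades(toy_replicates)
print(support)
```

With 1,000 replicates and the default threshold, a grouping must appear at least 501 times to be kept, which matches the "more than 500" criterion.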

As this strategy still left four subgroups too large for detailed 
analyses, we recalculated neighbor-joining trees and in 
some cases least-square trees of these sequences to create 
Table 1

List of example receptor names, accession numbers and abbreviations

Receptor | Group | Accession no. | Names and synonyms

Human GPCR - Family A

ADMR | A02 | O15218 | Adrenomedullin receptor, Am-R
APJ | A03 | P35414 | Apelin receptor, Apj, Agtrl1
CML1 | A08 | Q99788 | Chemokine receptor-like 1, Dez, Chemr23, Ch23, Cmklr1
CML2 | A02 | Q99527 | Chemokine receptor-like 2, flow-induced endothelial G protein-coupled receptor, Feg-1, Gpr30, Cmkrl2, Dry12, Cepr
DUFF | A02 | Q16570 | Duffy antigen, Fy glycoprotein, glycoprotein D, Gpfy, Fy, Gpd, Darc
EDG1 | A13 | P21453 | Endothelial differentiation, sphingosine 1-phosphate receptor, Lp-B1
EDG2 | A13 | Q92633 | Endothelial differentiation, lysophosphatidic acid receptor, Lp-A1, Vzg-1
EDG3 | A13 |  | Endothelial differentiation, lysosphingolipid receptor, Lp-B3
EDG4 | A13 |  | Endothelial differentiation, lysophosphatidic acid receptor, Lp-A2
EDG5 | A13 |  | Endothelial differentiation, sphingolipid receptor, Lp-B2, H218, Agr16
EDG6 | A13 |  | Endothelial differentiation, lysosphingolipid receptor, Lp-C1
EDG7 | A13 |  | Endothelial differentiation, lysophosphatidic acid receptor, Lp-A3
EDG8 | A13 |  | Endothelial differentiation, sphingosine 1-phosphate receptor, Lp-B4
ETBR-LP2 | A07 |  | Endothelin B receptor-like protein-2, Etbrlp2, Ebp2, Cns2
FSHR | A10 | P23945 | Follicle stimulating hormone receptor, Fsh-R, follitropin receptor
GPR | A06 |  | G protein-coupled receptor
GPR1 | A08 | P46091 | G protein-coupled receptor Gpr1
GPR3 | A13 | P46089 | G protein-coupled receptor, Acca orphan receptor
GPR6 | A13 | P46095 | G protein-coupled receptor 6
GPR7 | A04 | P48145 | G protein-coupled receptor 7
GPR8 | A04 |  | G protein-coupled receptor 8
GPR25 | A03 | NM_005298 | G protein-coupled receptor 25
GPR27 | A18 | NM_018971 | G protein-coupled receptor 27, Sreb1
GPR34 | A12 |  | G protein-coupled receptor, Gpry
GPR35 | A15 |  | G protein-coupled receptor 35
GPR37 | A07 |  | G protein-coupled receptor 37, Endothelin receptor type B-like, Cns1
GPR39 | A07 |  | G protein-coupled receptor Gpr39
GPR40 | A11 |  | G protein-coupled receptor Gpr40
GPR41 | A11 |  | G protein-coupled receptor Gpr41, Hia-R
GPR42 | A11 |  | G protein-coupled receptor Gpr42
GPR43 | A11 |  | G protein-coupled receptor Gpr43
GPR44 | A08 |  | G protein-coupled receptor 44
GPR48 | A10 |  | G protein-coupled receptor 48
GPR49 | A10 |  | G protein-coupled receptor 49, Hg38, G protein-coupled receptor 67, Fex
GPR52 | A18 |  | G protein-coupled receptor Gpr52
GPR55 | A15 |  | G protein-coupled receptor 55
GPR57 | A17 | NM_014627 | G protein-coupled receptor 57
GPR58 | A17 | NM_014626 | G protein-coupled receptor 58
GPR61 | A18 | AF317652 | G protein-coupled receptor 61
GPR62 | A18 | AF317653 | G protein-coupled receptor 62
GPR63 | A18 | AF317654 | G protein-coupled receptor 63
GPR72 | A09 | NM_016540 | G protein-coupled receptor 72, Jp05
GPR73 | A09 | AAE24084 | G protein-coupled receptor 73
GPR75 | A09 | NM_006794 | G protein-coupled receptor 75
GPR80 | A11 | AF411109 | G protein-coupled receptor 80
GPR81 | A11 | AF411110 | G protein-coupled receptor 81
GPR85 | A18 | NM_018970 | G protein-coupled receptor 85, Sreb2
GPR86 | A12 | NP_076403 | Adp receptor
GPR87 | A12 | NM_023915 | G protein-coupled receptor 87
GPR88 | A18 | NM_022049 | G protein-coupled receptor 88
GPR91 | A11 | NM_033050 | G protein-coupled receptor 91
Table 1 (continued)

Receptor | Group | Accession no. | Names and synonyms

GPR101 | A18 | NM_054021 | G protein-coupled receptor 101
GPR102 | A17 | NM_053278 | G protein-coupled receptor 102
GPR103 | A06 | AF411117 | G protein-coupled receptor 103
GPRC | A13 | P47775 | Gpr12
GPRF | A03 | P49685 | Gpr15, Bob
GPRJ | A09 | Q15760 |
GPRL | A18 | Q99679 |
GPRM | A06 | Q99680 | Gpr22
GPRV | A11 | O00270 | Gpr31
GPRW | A08 | O75388 | Gpr32
HM74 | A11 | P49019 | G protein-coupled receptor Hm74
KI01 | A12 | Q15391 |
LSHR | A10 |  | Lutropin-choriogonadotropic hormone receptor, Lh/Cg-R, Lsh-R, luteinizing hormone receptor, Lhcgr, Lhrhr, Lcgr
MAS | A08 | P04201 | Mas proto-oncogene, Mas1
ML1A | A09 | P48039 | Melatonin receptor type 1a, Mel-1a-R, Mtnr1a
ML1B | A09 |  | Melatonin receptor type 1b, Mel-1b-R, Mtnr1b
ML1X | A09 | Q13585 | Melatonin-related receptor, H9, Gpr50
MRG | A08 | P35410 | Mas-related G protein-coupled receptor
NMU1R | A07 |  | Neuromedin U receptor 1, Nmur1, Gpr66, Fm-3
NTR1 | A07 |  | Neurotensin receptor type 1, Nt-R-1, Ntsr1, Ntrr
NTR2 | A07 |  | Neurotensin receptor type 2, Nt-R-2, levocabastine-sensitive neurotensin receptor, Ntr2 receptor, Ntsr2
NY1R | A09 | P25929 | Neuropeptide Y receptor type 1, Npy1-R, Npy1r, Npyr, Npyy1
NY2R | A09 | P49146 | Neuropeptide Y receptor type 2, Npy2-R, Npy2r
NY4R | A09 | P50391 | Neuropeptide Y receptor type 4, Npy4-R, Pancreatic polypeptide receptor 1, Pp1, Ppyr1, Npy4r
P2Y5 | A15 | P43657 | P2y purinoceptor 5, P2y5, purinergic receptor 5, P2ry5, 6h1
P2Y7 | A05 | Q15722 | P2y purinoceptor 7, P2y7, Leukotriene B4 receptor, chemoattractant receptor-like 1, P2ry7, P2y7, Gpr16, Cmkrl1, Ltb4r
P2Y9 | A15 | Q99677 | P2y purinoceptor 9, P2y9, purinergic receptor 9, Gpr23, P2ry9
P2Y10 | A15 | AF000545 | Putative purinergic receptor P2y10
P2Y12 | A12 | AF313449 | Adp receptor, Sp1999
PAFR | A12 | P25105 | Platelet activating factor receptor, Paf-R, Ptafr
PNR | A17 | AF021818 | Putative neurotransmitter receptor
PSP24 | A18 |  | High-affinity lysophosphatidic acid receptor homolog, Gpr45
RDC1 | A02 | P25106 | G protein-coupled receptor Rdc1 homolog
RE2 | A18 | AF091890 | G protein-coupled receptor Re2
SALPR | A05 | NM_016568 | Somatostatin and angiotensin-like peptide receptor
SREB3 | A18 | NM_018969 | Super conserved receptor expressed in brain 3
TM7SF1 | A01 | AF027826 | Putative seven pass transmembrane protein
TSHR | A10 | P16473 | Thyroid stimulating hormone receptor, thyrotropin receptor, Tsh-R

Human GPCR - Family B

EMR1 | B | Q14246 | Cell surface glycoprotein Emr1, Emr1 hormone receptor
EMR2 | B | AF114491 | Egf-like module Emr2
EMR3 | B | AF239764 | Egf-like module-containing mucin-like receptor Emr3
BAI1 | B | O14514 | Brain-specific angiogenesis inhibitor 1
BAI2 | B | O60241 | Brain-specific angiogenesis inhibitor 2
BAI3 | B | O60242 | Brain-specific angiogenesis inhibitor 3, Kiaa0550
GPR56 | B | NM_005682 | G protein-coupled receptor 56

Human GPCR - Family C

GPRC5B | C | NM_016235 | G protein-coupled receptor, family C, group 5, member B, Gprc5b
GPRC5C | C | NM_018653 | G protein-coupled receptor, family C, group 5, member C, Gprc5c
GPRC5D | C | NM_018654 | G protein-coupled receptor, family C, group 5, member D, Gprc5d

A complete list is supplied as an additional data file. Orphan receptors are shown in bold.
subgroups A1 and 2, A4 and 5, A11 and 15 and A17 and 18.
This approach finally resulted in 19 differently sized subgroups
of family A (Table 2) that were further subjected to
the more reliable maximum-likelihood and quartet-puzzling



algorithms. Maximum-likelihood approaches calculate the 
probability of the observed data assuming that it has evolved 
in accordance with a chosen evolutionary model. Phyloge- 
nies are then inferred by finding trees and parameters that 




Figure 1

An example multiple sequence alignment of seven receptors. Protein sequences of GPR87, KI01, GPR86, P2Y12, H963, GPR34 and PAFR belonging to
subgroup 12 were aligned with ClustalX and modified by deleting the extremely variable amino termini upstream of the first transmembrane domain and
carboxyl termini downstream of the seventh transmembrane domain as indicated. Identical amino-acid residues in all aligned sequences are shaded in
black and similar residues in gray. Transmembrane (TM) domains identified by the TMpred program are indicated.



Genome Biology Vol 3 No 11, Joost and Methner




Figure 2

Neighbor-joining tree of the rhodopsin receptor-like family A inferred from the multiple sequence alignment using PHYLIP 3.6. Support values for each
internal branch were obtained by 1,000 bootstrap steps, and are indicated. Pairwise distances were determined with PROTDIST and the JTT substitution
frequency matrix. The tree was calculated with NEIGHBOR using standard parameters and rooted with the distant, though related, family-B receptor
GPRC5B as the outgroup. The consensus tree of all bootstrapped sequences was obtained with CONSENSE. Orphan receptors are shown in bold. Scale
bar indicates the branch length of 100 substitutions per site.
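Table 2

Receptor subgroups derived from a combination of neighbor-joining and BLASTP results

A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11
C3X1 | ADMR | AG22 | GPR7 | GALR | FF1R | BRS3 | C3AR | GPR72 | FSHR | GPR40
CKR1 | BONZO | AG2R | GPR8 | GALS | FF2R | ET1R | C5AR | GPR73 | GPR48 | GPR41
CKR2 | CCR11 | AG2S | OPRD | GALT | GASR | ETBR | C5L2 | (GPR75) | GPR49 | GPR42
CKR3 | CCR3 | APJ | OPRK | GPR54 | (GPR) | ETBR-LP2 | CML1 | GPRA | LSHR | GPR43
CKR4 | CCR4 | BRB1 | OPRM | GPRO | GPR103 | GHSR | FML1 | GPRJ | TSHR | GPR80
CKR5 | CCR5 | BRB2 | OPRX | P2Y7 | (GPRM) | GPR37 | FML2 | ML1A | - | GPR81
- | - | GPR25 | SSR1 | SALPR | GRHR | GPR38 | FMLR | ML1B | - | GPR82
CKRX | CKR7 | GPRF | SSR2 | UR2R | OX1R | GPR39 | GPR1 | ML1X | - | GPR91
CXC1 | CKR9 | - | SSR3 | - | OX2R | GRPR | GPR44 | NK1R | - | GPRV
(TM7SF1) | CKRA | - | SSR4 | - | OXYR | NMBR | GPRW | NK2R | - | HM74
- | CML2 | - | SSR5 | - | V1AR | NMU1R | (MAS) | NK3R | - | P2UR
- | (DUFF) | - | - | - | V1BR | NMU2R | (MRG) | NK4R | - | P2Y11
- | IL8A | - | - | - | V2R | NTR1 | - | NY1R | - | P2Y4
- | IL8B | - | - | - | - | NTR2 | - | NY2R | - | P2Y6
- | RDC1 | - | - | - | - | TRFR | - | NY4R | - | P2YR
- | - | - | - | - | - | - | - | NY5R | - | -

A12 | A13 | A14 | A15 | A16 | A17 | A18 | A19 | B | C
GPR34 | ACTR | PD2R | EBI2 | OPSB | 5H2A | AA1R | 5H1A | BAI1 | CASR
GPR86 | CB1R | PE21 | G2A | OPSD | 5H2B | AA2A | 5H1B | BAI2 | GBR1
GPR87 | CB2R | PE22 | GPR35 | OPSG | 5H2C | AA2B | 5H1D | BAI3 | GBR2
- | - | PE23 | GPR4 | OPSR | 5H6 | AA3R | 5H1E | CALR | GPRC5B
- | - | PE24 | GPR55 | OPSX | A1AA | ACM1 | 5H1F | CD97 | GPRC5C
- | - | PF2R | GPR65 | RGR | A1AB | ACM2 | 5H5A | CGRR | GPRC5D
PAFR | EDG4 | PI2R | GPR68 | - | A1AD | ACM3 | 5H7 | CRF1 | MGR1
- | - | TA2R | - | - | A2AA | ACM4 | - | CRF2 | MGR2
- | EDG6 | - | GPRH | - | A2AB | ACM5 | - | EMR1 | MGR3
- | - | - | - | - | A2AC | GPR101 | - | EMR2 | MGR4
- | EDG8 | - | GPRK | - | - | - | - | - | MGR5
- | - | - | - | - | B1AR | GPR52 | - | GIPR | MGR6
- | - | - | - | - | B2AR | GPR61 | - | GLPR | MGR7
- | - | - | - | - | B3AR | GPR62 | - | GLR | MGR8
- | MC3R | - | PAR2 | - | D2DR | GPR63 | - | - | -
- | MC4R | - | PAR3 | - | D3DR | GPR78 | - | GPR56 | -
- | MC5R | - | THRR | - | D4DR | GPR84 | - | GRFR | -
- | MSHR | - | - | - | DADR | GPR85 | - | PACR | -
- | - | - | - | - | DBDR | (GPR88) | - | PTR2 | -
- | - | - | - | - | GPR102 | GPRL | - | PTRR | -
- | - | - | - | - | GPR57 | HH1R | - | SCRC | -
- | - | - | - | - | GPR58 | PSP24 | - | VIPR | -
- | - | - | - | - | HH2R | RE2 | - | VIPS | -
- | - | - | - | - | PNR | SREB3 | - | - | -

Very distantly related receptors that are possibly not phylogenetically related are shown in brackets. Orphan receptors are shown in bold.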
yield the highest likelihood. Maximum-likelihood approaches
tend to outperform alternative methods such as parsimony 
or distance-based methods. The main advantage is the appli- 
cation of a well defined model of sequence evolution to a 
given dataset [26]. Maximum likelihood is the estimation 
method least affected by sampling error and tends to be 
robust to many violations of the assumptions in the evolu- 
tionary model. The methods are statistically well founded, 
evaluate different tree topologies and use all sequence infor- 
mation available [27,28]. Because of their smaller size, fami- 
lies B and C could be subjected to these methods without 
prior subgrouping. This resulted in 19 phylogenetic trees, 
comprising 241 receptors for family A (Figures 3-6), one 
tree from 23 sequences for family B and one tree from 14 
sequences for family C (Figure 7). Family-A trees were 
rooted with the human family-B receptor GPRC5B and fam- 
ilies B and C with family-A receptor 5H1A. The sequence 
used to root the tree (the outgroup) is supposed to be a 
distant, though related, sequence. In some of our groups, 
the phylogenetic trees could not be fully resolved. This could 
be due to either very similar or very distant sequences. In 
both cases the phylogenetic signal is too weak to resolve the 
tree [29]. Several receptors (for example, TM7SF1, DUFF, 
GPR, GPRM, GPR75, GPR88, MAS and MRG) were found to
be only distantly related to other known receptors used in 




Figure 3 

Chemokine receptors (subgroups A1 and A2). Phylogenetic trees of the
subgroups were inferred using Puzzle 5.0 corrected by the JTT
substitution frequency matrix. Quartet-puzzling support percentage
values from 10,000 puzzling steps are shown. The scale bars indicate a
maximum likelihood branch length of 0.1 inferred substitutions per site.
Orphan receptors are shaded.



our analysis. A possible explanation could be the previously 
proposed convergent evolution of this large protein family, 
meaning that these receptors have acquired the compelling 
similarity in their overall structures as a result of functional 
need, not phylogenetic relationship. The lack of significant 
sequence similarity among the different GPCR families 
favors this assumption [30-32]. Other explanations for the 
lack of significant sequence similarities might be an extra- 
ordinary divergence (genetic drift) or technical problems of 
the sequence-analysis methods used in analyzing polytopic 
membrane proteins or large protein families [33].
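
The maximum-likelihood principle described above (choose the parameters under which the observed data are most probable) can be shown on the smallest possible case: the distance between two aligned sequences. The sketch below is a deliberate simplification and not the method used for the trees in this work. It assumes an equal-rate (Poisson) model with 20 amino-acid states instead of the JTT matrix, estimates a single pairwise distance rather than a tree, and uses hypothetical counts of 10 differences in 100 aligned sites.

```python
import math


def log_likelihood(distance, n_sites, n_diff, n_states=20):
    """Log-likelihood of two aligned sequences under an equal-rate model.

    distance is the expected number of substitutions per site.
    """
    decay = math.exp(-n_states * distance / (n_states - 1))
    p_same = (1.0 + (n_states - 1) * decay) / n_states   # residue unchanged
    p_diff = (1.0 - decay) / n_states                     # one specific change
    prior = 1.0 / n_states                                # equal frequencies
    return ((n_sites - n_diff) * math.log(prior * p_same)
            + n_diff * math.log(prior * p_diff))


def ml_distance(n_sites, n_diff, step=1e-4, max_distance=5.0):
    """Grid search for the distance that maximizes the likelihood."""
    best_distance, best_ll = step, -math.inf
    for i in range(1, int(round(max_distance / step)) + 1):
        d = i * step
        ll = log_likelihood(d, n_sites, n_diff)
        if ll > best_ll:
            best_distance, best_ll = d, ll
    return best_distance


estimate = ml_distance(n_sites=100, n_diff=10)
# Closed-form maximum for this simple model, used as a cross-check.
closed_form = -(19 / 20) * math.log(1 - (20 / 19) * (10 / 100))
print(f"grid estimate {estimate:.4f}, closed form {closed_form:.4f}")
```

For a full tree the same idea applies, but the likelihood is summed over unobserved ancestral states and maximized over branch lengths and topologies, which is why the subgroups had to be kept small.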

Receptor family A subgroups 

In contrast to the subfamilies presented in GPCRDB [34], a 
database widely used in the field, our grouping shows the 
orphan receptors within their respective subgroup and their 
relationship to receptors with known ligands. In addition, 
our method sometimes resulted in subgroups with members 
whose ligands belong to different substance classes. These 
results are discussed in more detail below. 

Chemokine receptors 

Groups A1 and A2 comprise the chemokine receptors
(Figure 3). The chemokine ligand superfamily is defined by
four conserved cysteines that form two disulfide bonds, and
can be structurally subdivided into two major branches
based on the spacing of the first cysteine pair. Chemokines
in which these residues are adjacent form the CC subfamily
(corresponding to the SWISS-PROT CKR nomenclature
used here), and those separated by a single amino acid comprise
the CXC subfamily (here CCR and IL8R; for a review
see [35]). We had to divide the whole subfamily into two
groups to perform a detailed phylogenetic analysis. This subgrouping
produced the same dichotomy, as suggested by the
two-ligand motifs, as another example of the parallel evolution
of receptors and ligands. Similar results describing this
parallel evolution were found previously using a different
computational approach [36].
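
The ligand classification by cysteine spacing can be written down directly. The sketch below looks only at the first two cysteines of a mature ligand sequence; the function name and the short test strings are illustrative assumptions, and real classification would also check the remaining conserved cysteines.

```python
def classify_chemokine(mature_sequence):
    """Classify a chemokine by the spacing of its first cysteine pair."""
    first = mature_sequence.find("C")
    if first == -1:
        return "unclassified"
    second = mature_sequence.find("C", first + 1)
    if second == -1:
        return "unclassified"
    residues_between = second - first - 1
    if residues_between == 0:
        return "CC"       # adjacent cysteines
    if residues_between == 1:
        return "CXC"      # one intervening amino acid
    return "unclassified"


# Hypothetical amino-terminal fragments, for illustration only.
for fragment in ("AAPCCFSY", "SAKELRCQCIKT", "AAACAAACAA"):
    print(fragment, classify_chemokine(fragment))
```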

Group A1 mainly comprises the CC family. We hypothesize
that the orphan receptor CKRX, which constitutes a separate
branch related to CKR1, 2, 3 and 5, might also bind a CC
ligand. In contrast, TM7SF1 in this group seems to be only 
distantly, if at all, related to family-A receptors. It was 
grouped according to BLASTP results, where a misleading 
local alignment of approximately 20 amino acids placed it in 
the vicinity of the chemokine receptors. Group A2 is more 
heterogeneous and comprises receptors for CC and CXC 
ligands, as well as an orphan receptor (ADMR) previously 
thought to bind the peptide adrenomedullin.
Adrenomedullin has now been shown to bind a family-B
receptor and is discussed further below. The orphan receptor
RDC1 in group A2 was first believed to be a receptor for
vasointestinal peptide VIP [37], a notion not supported by 
phylogeny and later dismissed by experimental data [38]. 
Our results place it closer to the ADMR receptor than to the 
Figure 5

Nucleotide and lipid receptors (subgroups A11-A16). The scale bar indicates a maximum-likelihood branch length of 0.1 inferred substitutions per site.
Orphan receptors are shaded. For method see Figure 2. 



typical chemokine receptors. CML2 is a typical, but distant, 
member of the chemokine receptor family. The DUFF recep- 
tor (the Duffy antigen) is also very distantly related and was 
only grouped into A2 by BLASTP results. 

Peptide receptors 

Group A3 consists of receptors for the small peptides 
angiotensin (8 amino acids), bradykinin (9 amino acids) and 
apelin (Figure 4). Four forms of apelin (12, 13, 17 and 36 
amino acids) have been described, but only those of 12 and 13 
amino acids bind in nanomolar concentrations [39]. The 
orphan receptors GPRF and GPR25 in this group are related 
as closely to the apelin receptor APJ as to the angiotensin or 
bradykinin receptors, and might also bind small peptides. 
GPRF acts as a co-receptor for the human immunodeficiency 
virus (HIV) [40], like the APJ receptor [41], which further 
hints at structural homology of the two ligands. Opioid and 
somatostatin receptors make up group A4. Both somatostatin 
and opioid peptides are derived from the processing of larger 
precursors. The somatostatins are cyclic peptides of 14 and 



28 amino acids. The opioid precursors preproenkephalin,
preprodynorphin, prepro-opiomelanocortin and prepronoci- 
ceptin display a strikingly similar general organization and a 
conserved amino-terminal region that contains six cysteines, 
probably involved in disulfide bond formation. 

The processed neuropeptides, in contrast, are less similar to 
each other. It could be speculated that the receptors first 
bound the precursors themselves, and that the diversity 
derived from processing is evolutionarily new. Processing 
prepronociceptin gives rise to two evolutionarily conserved 
peptides besides orphanin FQ, the ligand for OPRX. It has 
not been reported whether these peptides bind to the orphan 
receptors GPR7 and GPR8, which constitute a new branch
related to the opioid receptors. 

In group A5 we find three receptors that bind the 30-amino-acid
peptide galanin, and related to these the GPR54 receptor,
which is activated by the 54-, 14-, and 13-amino-acid
peptides derived from the product of KiSS-1, a metastasis
suppressor gene for melanoma cells. These kisspeptins all
share a common RF-amide carboxyl terminus. Although only
distantly related to each other, both GPRO (melanin-concentrating
hormone) and UR2R (urotensin II peptide) bind
cyclic peptides originally isolated from fish. Similarly distant
is the orphan receptor SALPR, which shares sequence similarity
with somatostatin (A4) and angiotensin (A3) receptors,
but subgrouping of groups A4 and A5 by neighbor
joining led to its placement in group A5. SALPR does not bind
somatostatin or angiotensin ligands [42], but could bind
another cyclic peptide. The P2Y7 receptor in group A5 does
not bind nucleotides [43], as suggested by the name, but was
published as a receptor for the lipid leukotriene B4 [44], a
notion not supported by phylogeny. In addition, two new
leukotriene receptors - CLT1 and CLT2 - have been cloned
and characterized during the preparation of this manuscript
[45,46] and were found to be unrelated to P2Y7.

Group A6 is again composed solely of receptors for peptide 
ligands. The orphan receptor GPR103 is related to the 
neuropeptide FF receptors that bind two amidated mam- 
malian neuropeptides - NPAF (A-18-F-amide) and NPFF (F- 
8-F-amide), also known as morphine-modulating peptides. 
These peptides, which may also be the ligand for GPR103, 
are members of a large family of neuropeptides related to the 
molluscan cardioexcitatory neuropeptide (FMRF-amide, 
Phe-Met-Arg-Phe-amide). The orphan receptors GPRM and 
GPR in group A6 are most probably also peptide receptors, 
but are only very distantly related to the others and show no 



relationship to receptors with known ligands. Group A7 is 
also composed of receptors for peptide ligands: neuromedin, 
neurotensin, motilin, endothelin, bombesin and the releas- 
ing hormones for growth hormone and thyrotropin. GPR39 
might bind a small peptide ligand like the closely related 
neurotensin receptors NTR1 and 2, which bind a 13-amino-acid
peptide derived from a larger precursor protein. GPR37
and ETBR-LP2 are related to each other and branch off the 
endothelin receptors that bind characteristic bicyclic pep- 
tides of 21 amino acids containing four cysteines linked by 
two disulfide bonds. 

Group A8 has two branches with receptors with known
ligands. These receptors bind the structurally diverse but
functionally related chemotactic substances N-formylmethionyl
peptides and the anaphylatoxic complement factors. The
N-formylmethionyl ligands are small hydrophilic peptides of
bacterial origin, but recently a number of new peptide agonists
have been identified that selectively activate the high-affinity
fMLF receptor FPR and/or its low-affinity variant
FPRL1. These agonists include peptide domains derived
from the envelope proteins of HIV type 1 and at least three
amyloidogenic polypeptides, the human acute-phase protein
serum amyloid A, the 42-amino-acid form of beta-amyloid
peptide and a 21-amino-acid fragment of the human prion
protein. Furthermore, a cleavage fragment of neutrophil
granule-derived bactericidal cathelicidin, LL-37, is also a
chemotactic agonist for FPRL1 (for a review see [47]). The
complement factors C5a and C3a are large but highly
Figure 7

Families B and C of the G-protein-coupled receptors (GPCRs). Phylogenetic trees of families B and C were inferred using Puzzle 5.0 corrected by the
JTT substitution frequency matrix. Quartet-puzzling support percentage values from 10,000 puzzling steps are shown. The scale bar indicates a maximum-likelihood
branch length of 0.1 inferred substitutions per site. Orphan receptors are shaded.



hydrophilic proteins with a mainly alpha-helical structure 
held together by three disulfide bridges. C5a is rapidly
desarginated to the less potent derivative C5adR74, which is
the ligand for the C5L2 receptor. The orphan receptors
GPR1, CML1 and GPR44 all cluster, and constitute a separate
branch as distant as the other two branches. No prediction
of the possible structure of the ligands for these
receptors can be derived from this tree, but maybe they will 
function as chemotactic peptides. This could at least hint at 
leukocytes or inflamed tissue as a possible source for these 
ligands. The receptor GPRW constitutes its own branch, not 
as distant to the main group as the MAS oncogene product 
and the related receptor MRG, which are only very distantly 
related to the group. 

All receptors in group A9 with known ligands bind peptides,
except for a side branch consisting of receptors for the biogenic
amine melatonin. The orphan receptor ML1X is closely
related to melatonin receptors ML1A and B, but apparently
does not bind melatonin [48]. GPR73 is related to the neuropeptide
Y (NPY) receptor NY2R which mainly binds the pancreatic
peptide YY of 36 amino acids, and these two are
placed together on a branch distinct from the NPY receptors
NY4R and NY1R. GPR73 does not bind the NPY ligand
family [49], but possibly a similar large peptide ligand.
The orphan receptors GPR72 and GPRJ constitute a new 



subgroup that most probably binds related peptide ligands.
GPR72 does not bind an NPY ligand [49]. GPR75 is only very
distantly related to the whole A9 group. The receptors for the
glycoprotein hormones thyroid-stimulating hormone (TSH),
luteinizing hormone (LSH) and follicle-stimulating hormone
(FSH) make up Group A10. GPR48 and 49 are very similar
in their overall structure, with long amino termini, but their
relationship is also evident in the neighbor-joining tree constructed
from alignments without amino and carboxyl
termini. It has been recently shown that these receptors
mediate the action of relaxin, a peptide hormone of the
insulin-like growth factor family secreted by the corpus
luteum during pregnancy [50].

Nucleotide and lipid receptors 

The receptors with known ligands in group A11 are the P2Y
receptors, which bind pyrimidine as well as purine nucleotides 
(Figure 5). Several orphan receptors constitute new clusters. 
GPR80 and GPR91 are distantly related to each other and rel- 
atively close to the P2Y receptors. GPR80 is the closest relative 
of the newly identified CLT2 receptor for leukotrienes as 
judged by BLASTP results. GPR81, HM74, GPRV and GPR40-43
belong to branches only distantly related to P2Y receptors.
Within these potential new subfamilies, GPR41-43,
GPR81 and HM74 are more closely related to each other than 
to GPR40 (for GPR41-43) and GPRV (for GPR81 and HM74). 
In group A12, the platelet-activating factor receptor, a lipid receptor,
and receptors activated by nucleotides mingle, but are found
on different side branches. The orphan receptor GPR87 is
closely related to the receptor for UDP-glucose KI01 and to
the ADP-binding receptors P2Y12 and GPR86. We assume 
that this receptor might also bind UDP-glucose or another 
modified nucleotide. GPR34 is distantly related to the 
platelet-activating factor (PAF) receptor; it was not activated 
by available lipid ligands [51], but might nevertheless bind a 
lipid ligand. Group A13 contains both peptide and lipid 
receptors but they make up different branches. The peptide 
branch binds peptides derived from the processing of pro- 
opiomelanocortin that gives rise to peptides of between 12 
and 36 amino acids. The EDG and cannabinoid receptors 
constitute clusters, and one cluster distinct from the other 
three consists of the orphan receptors GPR3, GPR6 and 
GPRC, which have been grouped closer to the lipid EDG 
receptors in the overall neighbor-joining tree (Figure 2). 
This information helped to identify a phospholipid ligand for 
GPRC (H. Chica Schaller, personal communication).

The receptors in group A14 all bind ligands derived from
arachidonic acid by the action of cyclooxygenase. These
receptors for lipid-derived autacoids or prostanoids comprise
receptors for the prostaglandins and thromboxanes.
There are no orphan receptors in this group. Group A15 is a
very heterogeneous group composed of receptors for the
lipids sphingosylphosphorylcholine (SPC), lysophosphatidylcholine
(LPC) and psychosine, and receptors activated
by proteases. GPR4 and GPR68 both bind SPC, like
the EDG receptor branch consisting of the EDG1, 3, 6 and
8 receptors in A13, but are not closely related. Protease-activated
receptors become activated by a part of the
former amino terminus cleaved by the protease. The new
amino terminus then functions as a tethered ligand and
activates the receptor. This can be mimicked by very small
peptides derived from this ligand; such receptors should
therefore rather resemble peptide receptors. The orphan
P2Y5, P2Y9 and P2Y10 receptors were not placed in groups
A11 and A12 like most P2Y receptors, but in group A15, supporting
the view that they were misnamed. P2Y5 and P2Y9
do not bind nucleotides [52,53], but this has not been
shown yet for P2Y10. All other orphan receptors in this
group, with the exception of GPR35 and GPR55 which 
cluster together, are as distantly related to each other as to 
the receptors with known ligands. Group A16 contains the 
opsins, receptors that are activated by isoprenoid ligands, 
and no orphan receptors. 

Biogenic amine receptors 

Some serotonin receptors and receptors for the biogenic 
amines adrenaline, dopamine and histamine are all placed 
on different branches in group A17 (Figure 6). An additional 
branch consists of the orphan receptors GPR102, PNR,
GPR57 and GPR58, which are as distantly related to the 
others as, for example, is the alpha-adrenergic receptor 



branch. PNR and GPR58 expressed in COS cells did not bind 
various serotonin receptor-specific ligands [54]. Their
ligands might be small molecules with similar properties.
Group A18 is very heterogeneous and consists of receptors
for the biogenic amines acetylcholine and adenosine, and the
HH1R receptor for histamine, as well as many distantly
related orphan GPCRs. GPR63 is closely related to the
orphan receptor PSP24. The Xenopus laevis homolog of this
receptor binds LPA [55]. GPR101 and RE2, GPRL and
GPR52, and GPR61 and GPR62 constitute their own sub- 
groups. In particular, the SREB1-3 cluster (GPR85, GPR27 
and SREB3) makes up its own family, with only a distant 
relationship to other GPCRs in this group. No orphan recep- 
tors are found in group A19, which consists entirely of sero- 
tonin receptors distinct from those in A17. 

During the preparation of this manuscript several new 
family-A receptors that could not be fitted into our analysis 
were identified. These comprise 15 new receptors distinct 
from the classical biogenic amine receptors that apparently 
bind the trace amines tyramine, β-phenylethylamine, tryptamine
and octopamine [56]. In addition, a new subfamily of
GPCRs related to the mas oncogene and uniquely expressed
in small nociceptive sensory neurons were shown to be the
receptors for a number of enkephalin fragments [57].

Receptor families B and C 

Family B (Figure 7) was named after the secretin receptor. 
Yet proteins showing homology to this receptor make up 
only one of four distantly related subgroups. The receptors 
EMR1, EMR2 and EMR3, and the CD97 surface antigen, all
have several epidermal growth factor (EGF)-like domains in 
the extracellular amino terminus. They constitute their own 
cluster only distantly related to the rest of the family. The 
same applies to the brain-specific angiogenesis inhibitor 
family BAI1-3. GPR56 was assigned to family B because it 
shows the typical signature [58], but is so far the only one of 
its kind. So far no non-protein ligand has been identified as 
a ligand for family-B receptors. Astonishingly, one family-B 
receptor, namely the CGRP receptor, requires coexpression 
with single transmembrane receptor activity-modifying 
proteins (RAMP1-3) for ligand binding and signal transduc- 
tion [59]. Coexpression of different RAMPs results in 
binding of different cyclic peptide ligands such as 
adrenomedullin, amylin or the calcitonin gene-related 
peptide (for a review see [60]). This could further compli- 
cate the identification of the cognate ligands for these 
family-B orphan receptors, but we assume that they will 
also bind large peptide ligands. In family C (Figure 7), the 
metabotropic glutamate receptors MGR1-8 bind the small
molecule glutamate, the CASR receptor senses extracellular
calcium concentration, and receptors GBR1-2 bind the
small molecule gamma-amino butyric acid (GABA).
GPRC5B, C and D constitute their own subgroup with no 
closer relationship to the other members, but might also 
bind small molecules. 
Conclusions

In this work, we calculated the phylogenetic distances of 277 
human GPCRs and show the relationship of orphan recep- 
tors to receptors for known ligands with support values for 
each branch. We then grouped orphan receptors and recep- 
tors with known ligands into 19 subgroups that sometimes 
differ from previous classifications. Three subgroups are 
composed of receptors for ligands that belong to different 
substance classes; for example, in group A12, lipid receptors 
and receptors activated by nucleotides mingle, and in groups 
A13 and A15, peptide and lipid receptors. In both subgroups 
the receptors binding ligands of different substance classes 
make up different branches. We hope that this approach 
proves valuable for identifying the natural ligands of orphan
receptors, as related receptors have previously been shown 
to have ligands with similar structural features. 

Materials and methods 
Sequence database mining 

A database search excluding olfactory and gustatory recep- 
tors identified the amino-acid sequences of 281 human 
GPCRs. Only sequences annotated as GPCRs in the following
databases were used: NCBI [61], SWISS-PROT [62], EMBL
[63] and GPCRDB [34,64]. Receptors without published
ligands in PubMed [65] were defined as orphan GPCRs.

Multiple sequence alignments 

Multiple protein sequences were aligned with ClustalX 1.81
[66]. Pairwise alignment parameters were set as: slow/accurate
alignment; gap opening penalty 10; gap extension
penalty 0.10; protein weight matrix BLOSUM 30. Multiple
alignment parameters were set as: gap opening penalty 10;
gap extension penalty 0.05; delay divergent sequences 35%;
protein weight matrix BLOSUM series [67]. The alignments
were modified by deleting the extremely variable amino 
termini upstream of the first transmembrane domain and 
carboxyl termini downstream of the seventh transmembrane 
domain. Alignment editing and shading was done using 
BioEdit Sequence Alignment Editor [68] and GeneDoc Mul- 
tiple Sequence Alignment Editor [69]. Transmembrane 
domains were identified using the TMpred program [70] 
and, where available, data from the original publication [71].
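
The trimming step described above can be illustrated with a minimal Python sketch. It assumes that the first residue of the first transmembrane domain and the last residue of the seventh are already known for each sequence (for example from a TMpred prediction); the function name `trim_termini`, its parameters and the toy sequence are hypothetical and are not taken from the original workflow, which edited the alignment itself.

```python
def trim_termini(sequence: str, tm1_start: int, tm7_end: int) -> str:
    """Keep only the region from the start of TM1 to the end of TM7.

    Positions are 1-based and inclusive, as in most sequence annotations.
    Everything upstream of tm1_start (amino terminus) and downstream of
    tm7_end (carboxyl terminus) is discarded.
    """
    if not (1 <= tm1_start <= tm7_end <= len(sequence)):
        raise ValueError("expected 1 <= tm1_start <= tm7_end <= len(sequence)")
    return sequence[tm1_start - 1:tm7_end]


# Toy sequence (hypothetical): 5 amino-terminal residues, a 13-residue
# "core", and 3 carboxyl-terminal residues.
toy = "MNNSTAAAALLLLLCCCCKKK"
core = trim_termini(toy, tm1_start=6, tm7_end=18)
print(core)
```

Trimming before tree building keeps the highly variable termini from dominating the distance estimates, which is the motivation given in the text.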

Clustering of subgroups 

An overall phylogenetic tree of family A was inferred from 
the multiple sequence alignment with PHYLIP 3.6 [72]. 
Bootstrapping was performed 1,000 times using SEQBOOT 
to obtain support values for each internal branch. Pairwise
distances were determined with PROTDIST and the JTT
substitution frequency matrix [73]. Neighbor-joining phylogenetic
trees [21] were calculated with NEIGHBOR using
standard parameters. The human GPRC5B receptor belonging
to family C was used as outgroup for family A. The outgroup
sequence is supposed to be a distant, though related,
sequence and is used to root the tree. The majority-rule



consensus trees of all bootstrapped sequences were obtained 
with the program CONSENSE. Representations of the calcu- 
lated trees were constructed with TreeView [74]. Clusters with 
bootstrap values greater than 50% were defined as confirmed 
subgroups, and sequences with lower values added to these 
subgroups according to their sequence similarity in the align- 
ment as judged by visual inspection and the results of pairwise 
local alignments with all other sequences by BLASTP [25]. The 
p-value was used as a measure of similarity. 
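
As a rough illustration of the bootstrap logic described above, the following Python sketch resamples alignment columns with replacement (the idea behind SEQBOOT) and then computes, for each cluster, the percentage of replicate trees in which it appears, keeping those above 50%. It is a simplified sketch under stated assumptions, not the PHYLIP implementation: the function names, the receptor labels R1 to R3 and the toy replicate trees are hypothetical, and the tree-building step itself (PROTDIST and NEIGHBOR) is not reproduced.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; length is preserved."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def clade_support(replicate_clades, n_replicates):
    """Percentage of replicate trees that contain each clade.

    replicate_clades: one set of clades (frozensets of names) per replicate.
    """
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    return {clade: 100.0 * k / n_replicates for clade, k in counts.items()}


def confirmed_subgroups(support, threshold=50.0):
    """Clades whose bootstrap support exceeds the threshold (in percent)."""
    return {clade for clade, pct in support.items() if pct > threshold}


# Toy alignment with hypothetical receptor names.
alignment = {"R1": "ACDEFG", "R2": "ACDEYG", "R3": "TCDQFG"}
replicate = bootstrap_alignment(alignment, random.Random(0))

# Toy clades recovered from four hypothetical replicate trees.
trees = [
    {frozenset({"R1", "R2"})},
    {frozenset({"R1", "R2"})},
    {frozenset({"R1", "R3"})},
    {frozenset({"R1", "R2"})},
]
support = clade_support(trees, len(trees))
confirmed = confirmed_subgroups(support)
```

In this toy case the R1/R2 cluster appears in three of four replicates (75%) and would count as confirmed, whereas R1/R3 (25%) would have to be assigned by other evidence, as the text describes for low-support sequences.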

Quartet-puzzling trees 

Multiple protein sequence alignments of these new subgroups 
were created as described above. Phylogenetic trees were 
inferred from these alignments using Puzzle 5.0 [75] to calculate
maximum-likelihood distances corrected by the JTT
substitution-frequency matrix [73] with amino-acid usage
estimated from the data, site-to-site rate variation modeled
on a gamma distribution with eight rate categories plus
invariant sites, and the shape parameter estimated from the
data. The human GPRC5B receptor of family C was used as an
outgroup for family A. The human 5H1A receptor of family A
was used as an outgroup for families B and C (the outgroups
are not shown in the figures here). Quartet-puzzling (QP)
trees were constructed with the described settings and 10,000 
puzzling steps to obtain support values (QP reliability) for 
each internal branch. The program Puzzle 5.0 was used in a 
parallelized version (ppuzzle) with a message-passing inter- 
face (MPI) implementation on a HP 9000 N-Class Enterprise 
Server Cluster consisting of five HP 9000 N-Class shared- 
memory multiprocessor systems with eight PA-RISC 8600 
(552 MHz) processors each. Representations of the quartet-puzzling
trees were constructed with TreeView [74].
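
Quartet puzzling works on groups of four sequences, so the number of quartets grows quickly with the size of a subgroup. The short Python sketch below only enumerates quartets and the three possible unrooted topologies for each; it does not perform the maximum-likelihood evaluation or the puzzling step, and the taxon labels and function name are hypothetical.

```python
from itertools import combinations
from math import comb


def quartet_topologies(a, b, c, d):
    """The three possible unrooted tree topologies for four taxa."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


taxa = ["A", "B", "C", "D", "E"]          # hypothetical labels
quartets = list(combinations(taxa, 4))    # every subset of four taxa
n_quartets = len(quartets)                # equals comb(len(taxa), 4)

# Quartet counts for some illustrative subgroup sizes.
counts = {n: comb(n, 4) for n in (5, 20, 50)}
```

A subgroup of 20 sequences already gives 4,845 quartets and one of 50 gives 230,300, which gives a feel for the computational load of the analysis.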

Additional data files 

Additional data files available with the online version of this
paper include a data table with names, synonyms and acces- 
sion numbers of all GPCRs, and the BLASTP results of all 
GPCRs (full-length sequences and sequences without amino 
or carboxyl termini). 
The complexity of such signal-response systems, with multiple interacting
relay chains of signaling proteins, is daunting. But recombinant DNA technology,
combined with classical genetic analyses in Drosophila, the nematode C. elegans,
and yeasts, as well as more conventional biochemical and pharmacological 
methods, is rapidly uncovering the intricate details of these mechanisms by 
which activated receptor proteins change the behavior of the cell. 

Summary 

Each cell in a multicellular animal is programmed during development to respond 
to a specific set of signals that act in various combinations to regulate the behavior 
of the cell and to determine whether the cell lives or dies and whether it proliferates 
or stays quiescent. Most of these signals mediate paracrine signaling, in which local
mediators are rapidly taken up, destroyed, or immobilized, so that they act only on 
neighboring cells. In addition, centralized control is exerted both by endocrine sig- 
naling, in which hormones secreted by endocrine cells are carried in the blood to 
target cells throughout the body, and by synaptic signaling, in which neurotransmitters
secreted by nerve cells act locally on the postsynaptic cells that their axons
contact.

Cell signaling requires both extracellular signaling molecules and a complemen- 
tary set of receptor proteins in each cell that enable it to bind and respond to them 
in a programmed and characteristic way. Some small hydrophobic signaling mol- 
ecules, including the steroid and thyroid hormones and the retinoids, diffuse across 
the plasma membrane of the target cell and activate intracellular receptor proteins, 
which directly regulate the transcription of specific genes. Some dissolved gases, such 
as nitric oxide and carbon monoxide, act as local mediators by diffusing across the 
plasma membrane of the target cell and activating an intracellular enzyme, usually
guanylyl cyclase, which produces cyclic GMP in the target cell. But most extracellular
signaling molecules are hydrophilic and are able to activate receptor proteins
only on the surface of the target cell; these receptors act as signal transducers, con- 
verting the extracellular binding event into intracellular signals that alter the behav- 
ior of the target cell. There are three main families of cell-surface receptors, each of 
which transduces extracellular signals in a different way. Ion-channel-linked receptors
are transmitter-gated ion channels that open or close briefly in response to the
binding of a neurotransmitter. G-protein-linked receptors indirectly activate or in- 
activate plasma-membrane bound enzymes or ion channels via trimeric GTP bind- 
ing proteins (G proteins). Enzyme-linked receptors either act directly as enzymes or 
are associated with enzymes; the enzymes are usually protein kinases that phosphorylate
specific proteins in the target cell. Through cascades of highly regulated protein
phosphorylations, elaborate sets of interacting proteins relay most signals from
the cell surface to the nucleus, thereby altering the cell's pattern of gene expression 
and, as a consequence, its behavior. Cross-talk between different signaling cascades 
enables a cell to integrate information from the multiple signals that it receives. 



Signaling via G-Protein-linked
Cell-Surface Receptors

G-protein-linked receptors are the largest family of cell-surface receptors. More 
than 100 members have already been defined in mammals. Many of these have 
been identified by homology cloning, in which low stringency hybridization with 
existing cDNA probes is used to detect related DNA sequences (see Figure 7-17). 
Other family members have been found by expression cloning, using their ligand- 
binding or cell-activation properties to identify them. In one form of this ap- 
proach, a library of cDNA molecules prepared from cells or tissues that express 
the receptor is copied into RNA molecules, which are then injected into Xenopus
oocytes. The oocytes translate the RNA molecules into proteins. These proteins 



734 Chapter 15 : Cell Signaling 



are inserted into the plasma membrane, where their ligand-binding or cell-activation
properties allow them to be detected.

G-protein-linked receptors mediate the cellular responses to an enormous
diversity of signaling molecules, including hormones, neurotransmitters, and
local mediators, which are as varied in structure as they are in function: the list
includes proteins and small peptides, as well as amino acid and fatty acid derivatives.
The same ligand can activate many different family members. At least 9
distinct G-protein-linked receptors are activated by adrenaline, for example,
another 5 or more by acetylcholine, and at least 15 by serotonin.

Despite the chemical and functional diversity of the signaling molecules that
bind to them, all of the G-protein-linked receptors whose amino acid sequences
are known from DNA sequencing studies have a similar structure and are almost
certainly evolutionarily related. They consist of a single polypeptide chain that
threads back and forth across the lipid bilayer seven times (Figure 15-17). As we
discuss later, this superfamily of seven-pass transmembrane receptor proteins
includes rhodopsin, the light-activated protein in the vertebrate eye, as well as
olfactory receptors in the vertebrate nose. Other family members are found in
unicellular organisms: the receptors in yeasts that recognize the yeast mating
factors are an example. This ancient structural motif is also shared by
bacteriorhodopsin, a bacterial light-activated H+ pump discussed in Chapter 10,
although, unlike the other family members, bacteriorhodopsin is not a receptor
and does not act via a G protein. Taken together, these findings suggest that the
G-protein-linked receptors that mediate cell-cell signaling in multicellular organisms
may have evolved from sensory receptors possessed by their unicellular
ancestors. The members of this receptor family have conserved not only their
amino acid sequence but also their functional relationship to G proteins, by
means of which they broadcast into the interior of the cell the message that an
extracellular ligand is present. It is the intracellular sequence of events beginning
with the activation of G proteins that mainly concerns us in this section.



Trimeric G Proteins Relay the Intracellular Signal
The trimeric GTP-binding proteins (G proteins) that functionally couple these
receptors to their target enzymes or ion channels in the plasma membrane are
structurally distinct from the single-chain GTP-binding proteins (called monomeric
GTP-binding proteins or monomeric GTPases) that help relay intracellular
signals and regulate vesicular traffic and many other processes in eucaryotic cells
(discussed in other chapters). Both classes of GTP-binding proteins, however, are
GTPases and function as molecular switches that can flip between two states: active,
when GTP is bound, and inactive, when GDP is bound. "Active" usually means
that the molecule acts as a signal to trigger other events in the cell. When an
extracellular ligand binds to a G-protein-linked receptor, the receptor changes its
conformation and switches on the trimeric G proteins that associate with it
by causing them to eject their GDP and replace it with GTP. The switch is turned
off when the G protein hydrolyzes its own bound GTP, converting it back to GDP.
But before that occurs, the active protein has an opportunity to diffuse away
from the receptor and deliver its message for a prolonged period to its downstream
target.

Most G-protein-linked receptors activate a chain of events that alters the
concentration of one or more small intracellular signaling molecules. These small
molecules, often referred to as intracellular mediators (also called intracellular
messengers or second messengers), in turn pass the signal on by altering the behavior
of selected cellular proteins. Two of the most widely used intracellular
mediators are cyclic AMP (cAMP) and Ca2+: changes in their concentrations are
stimulated by distinct pathways in most animal cells, and most G-protein-linked
receptors regulate one or the other of them, as outlined in Figure 15-18.
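
The GTP/GDP switch described in this section can be pictured as a two-state machine. The Python sketch below is only a toy model of that logic, with hypothetical class and method names; it ignores subunit structure, kinetics and the downstream targets.

```python
from enum import Enum


class Nucleotide(Enum):
    GDP = "GDP"
    GTP = "GTP"


class GProteinSwitch:
    """Toy model: a G protein is active only while GTP is bound."""

    def __init__(self):
        self.bound = Nucleotide.GDP       # resting state: GDP bound, inactive

    @property
    def active(self):
        return self.bound is Nucleotide.GTP

    def receptor_activated(self):
        """A ligand-bound receptor makes the G protein swap GDP for GTP."""
        self.bound = Nucleotide.GTP

    def hydrolyze(self):
        """The G protein hydrolyzes its own GTP and switches itself off."""
        self.bound = Nucleotide.GDP


g = GProteinSwitch()
states = [g.active]        # inactive before any signal arrives
g.receptor_activated()
states.append(g.active)    # active while GTP is bound
g.hydrolyze()
states.append(g.active)    # inactive again after GTP hydrolysis
```

The interval between `receptor_activated` and `hydrolyze` corresponds to the period in which the active protein can diffuse away from the receptor and act on its target.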
Figure 15-17 A schematic drawing of
a G-protein-linked receptor.

Receptors that bind protein ligands 
have a large extracellular ligand-binding
domain formed by the part of
the polypeptide chain shown in light 
green. Receptors for small ligands 
such as adrenaline have small 
extracellular domains, and the ligand- 
binding site is usually deep within the 
plane of the membrane, formed by 
amino acids from several of the 
transmembrane segments. The parts 
of the intracellular domains that are 
mainly responsible for binding to 
trimeric G proteins are shown in 
orange, while those that become 
phosphorylated during receptor
desensitization (discussed later) are 
shown in red. 